A MARKOV DECISION PROCESS WITH NON-STATIONARY TRANSITION LAWS
Authors
Abstract
Similar resources
Genetic Distance for a General Non-Stationary Markov Substitution Process
The genetic distance between biological sequences is a fundamental quantity in molecular evolution. It pertains to questions of rates of evolution, existence of a molecular clock, and phylogenetic inference. Under the class of continuous-time substitution models, the distance is commonly defined as the expected number of substitutions at any site in the sequence. We eschew the almost ubiquitous...
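As a rough illustration of the distance this entry refers to (not the construction from the paper itself), the sketch below computes the expected number of substitutions over a branch as the time integral of the instantaneous rate -Σᵢ πᵢ(t) Qᵢᵢ; the rate matrix Q, starting frequencies, and branch length are assumed values, and for a stationary process the integral collapses to -t Σᵢ πᵢ Qᵢᵢ.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative (hypothetical) substitution rate matrix Q; rows sum to 0.
Q = np.array([[-1.2,  0.5,  0.4,  0.3],
              [ 0.2, -0.7,  0.3,  0.2],
              [ 0.4,  0.3, -1.0,  0.3],
              [ 0.1,  0.2,  0.3, -0.6]])

pi0 = np.array([0.4, 0.3, 0.2, 0.1])   # assumed (non-equilibrium) starting frequencies
t_total = 0.5                           # assumed branch length in time units

# Expected number of substitutions over [0, t_total]: integrate the
# instantaneous rate -sum_i pi_i(t) * Q_ii, with pi(t) = pi0 @ expm(Q t).
ts = np.linspace(0.0, t_total, 501)
rates = np.array([-(pi0 @ expm(Q * t)) @ np.diag(Q) for t in ts])
distance = float(np.sum(0.5 * (rates[1:] + rates[:-1]) * np.diff(ts)))  # trapezoid rule
print(distance)
```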
Linear programming formulation for non-stationary, finite-horizon Markov decision process models
Linear programming (LP) formulations are often employed to solve stationary, infinite-horizon Markov decision process (MDP) models. We present an LP approach to solving non-stationary, finite-horizon MDP models that can potentially overcome the computational challenges of standard MDP solution procedures. Specifically, we establish the existence of an LP formulation for risk-neutral MDP models wh...
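For context, here is a minimal sketch of what an LP formulation for a non-stationary, finite-horizon MDP can look like; it is not the formulation established in the paper above. It minimizes the sum of epoch-indexed values subject to the Bellman inequalities v_t(s) ≥ r_t(s,a) + Σ_{s'} P_t(s'|s,a) v_{t+1}(s'), with randomly generated P_t, r_t, and zero terminal values as assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical small MDP with non-stationary transition laws: T epochs,
# S states, A actions; both P[t] and r[t] vary with the decision epoch t.
rng = np.random.default_rng(0)
T, S, A = 4, 3, 2
P = rng.dirichlet(np.ones(S), size=(T, S, A))   # P[t, s, a, s']
r = rng.uniform(0.0, 1.0, size=(T, S, A))       # r[t, s, a]
v_terminal = np.zeros(S)                        # assumed terminal values

# Decision variables: v[t, s] for t = 0..T-1, flattened row-major.
n_var = T * S
c = np.ones(n_var)                              # minimize sum_{t,s} v_t(s)

A_ub, b_ub = [], []
for t in range(T):
    for s in range(S):
        for a in range(A):
            # v_t(s) >= r_t(s,a) + sum_s' P_t(s'|s,a) v_{t+1}(s'),
            # rewritten as -v_t(s) + P_t(.|s,a) . v_{t+1} <= -r_t(s,a).
            row = np.zeros(n_var)
            row[t * S + s] = -1.0
            rhs = -r[t, s, a]
            if t + 1 < T:
                row[(t + 1) * S:(t + 2) * S] += P[t, s, a]
            else:
                rhs -= P[t, s, a] @ v_terminal  # last epoch uses fixed terminal values
            A_ub.append(row)
            b_ub.append(rhs)

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n_var, method="highs")
V = res.x.reshape(T, S)                         # optimal value-to-go at each epoch
print(np.round(V, 3))
```

Because each epoch's values are constrained only from below by the next epoch's values, minimizing their sum drives every v_t(s) down to the Bellman optimum by backward induction, so the LP solution equals the optimal value function.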
Learning in non-stationary Partially Observable Markov Decision Processes
We study the problem of finding an optimal policy for a Partially Observable Markov Decision Process (POMDP) when the model is not perfectly known and may change over time. We present the algorithm MEDUSA+, which incrementally improves a POMDP model using selected queries, while still optimizing the reward. Empirical results show the response of the algorithm to changes in the parameters of a m...
A generalized Markov decision process
In this paper we present a generalized Markov decision process that subsumes the traditional discounted, infinite-horizon, finite state and action Markov decision process, Veinott's discounted decision processes, and Koehler's generalization of these two problem classes. French résumé (translated): We present in this article a generalized Markov process that encompasses the Markov decision process ...
On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes
We consider infinite-horizon stationary γ-discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy. Using Value and Policy Iteration with some error ε at each iteration, it is well known that one can compute stationary policies that are 2γ/(1−γ)² ε-optimal. After arguing that this guarantee is tight, we develop variations of Value and Policy Iter...
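For context on the guarantee quoted above, the following is a minimal value-iteration sketch for a γ-discounted MDP that returns the stationary greedy policy the 2γ/(1−γ)² ε bound refers to; the MDP instance is randomly generated for illustration, and the paper's non-stationary (e.g. periodic) policy variants are not reproduced here.

```python
import numpy as np

def value_iteration(P, r, gamma, n_iter=500):
    """Plain value iteration for a gamma-discounted MDP.
    P[s, a, s'] are transition probabilities, r[s, a] expected rewards."""
    S, A = r.shape
    v = np.zeros(S)
    for _ in range(n_iter):
        v = (r + gamma * P @ v).max(axis=1)     # Bellman optimality backup
    q = r + gamma * P @ v
    return v, q.argmax(axis=1)                  # values and stationary greedy policy

# Randomly generated illustrative instance (assumed, not from the paper).
rng = np.random.default_rng(1)
S, A, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))      # P[s, a, s']
r = rng.uniform(size=(S, A))
v, policy = value_iteration(P, r, gamma)
print(np.round(v, 3), policy)
```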
Journal
Journal title: Bulletin of Mathematical Statistics
Year: 1968
ISSN: 0007-4993
DOI: 10.5109/13030